Abstract

Background: The barcoding of next generation sequencing libraries has become an essential part of the 
experimental design. Barcoding not only allows the sequencing of more than one sample per lane, but also 
reduces technical bias. However, current barcoding strategies impose significant limitations and/or technical barriers
to their implementation for ChIP-sequencing.

Findings: Converting Y-shaped sequencing adapters to double-stranded DNA prior to agarose gel size selection
reduces adapter dimer contamination, and using qPCR to determine the number of cycles required before library
amplification eliminates library over-amplification.

Conclusions: We describe an efficient and cost-effective method for making barcoded ChIP-seq libraries for
sequencing on the Illumina platform.
Findings

Chromatin immunoprecipitation followed by next-generation
sequencing (ChIP-seq) is a powerful method to identify
the genome-wide binding profiles of chromatin-associated
proteins and has emerged as one of the most important
tools used to study transcriptional regulation,
DNA replication, DNA recombination and chromatin
structure [1].

Technological advancements in next-generation sequencing
throughput have made it possible to produce more
sequencing data per lane on the Illumina platform than
is necessary to achieve saturation in a ChIP-seq
experiment. Thus, to perform experiments in a
cost-effective manner, multiple barcoded ChIP-seq libraries
must be pooled and sequenced in a single lane.
In addition, pooling barcoded samples from the same or
different experiments reduces technical variability between
samples [2]. Barcoding therefore not only reduces costs but
also produces higher-confidence, higher-quality data.
Unfortunately, current barcoding strategies impose significant
limitations and/or technical barriers to their
implementation for ChIP-seq analysis.
There are currently two main barcoding strategies for
the generation of next-generation sequencing (NGS)
libraries for sequencing on the Illumina platform. The first
embeds the barcode in the adapter oligonucleotide used
for library construction so that the first nucleotides
sequenced correspond to the barcode sequence [3]. On the
Illumina platform, it is critical that the four bases (A, T, G
and C) are represented in roughly equal proportions in
the first nucleotides sequenced [4]. Thus, it is necessary to
pool libraries in multiples of four so that the nucleotide
composition is balanced. However, this is inflexible:
samples can only be pooled in multiples of four, and
the researcher cannot pool libraries in unequal
ratios if more reads are required from one sample than
another. In addition, the invariant thymidine residue at the
3' end of the oligonucleotide adapters, which is required
for ligation to the immunoprecipitated DNA, can have an
adverse effect on the quality of the sequences generated.
To address these issues it is usually recommended that
libraries are spiked with Illumina's PhiX control library
and/or sequenced at a lower density [5]. However, these
measures reduce sequencing throughput significantly.
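
The balance constraint on first-read barcodes can be illustrated with a
minimal Python sketch. It is not part of the published protocol: the
function names, the 4-nt barcodes, the pooling weights and the 10%
tolerance are all assumptions made for illustration.

```python
def base_composition(barcodes, weights=None):
    """Per-position fraction of A, C, G and T across a pool of barcodes.

    `weights` are relative pooling ratios (hypothetical); equal by default.
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    if weights is None:
        weights = [1.0] * len(barcodes)
    if len(weights) != len(barcodes):
        raise ValueError("one weight per barcode is required")
    total = sum(weights)
    composition = []
    for position in range(length):
        column = dict.fromkeys("ACGT", 0.0)
        for barcode, weight in zip(barcodes, weights):
            base = barcode[position].upper()
            if base not in column:
                raise ValueError(f"unexpected base {base!r}")
            column[base] += weight / total
        composition.append(column)
    return composition


def is_balanced(barcodes, weights=None, tolerance=0.1):
    """True if every base is within `tolerance` of 25% at every position.

    The tolerance is an illustrative threshold, not a platform specification.
    """
    return all(
        abs(fraction - 0.25) <= tolerance
        for column in base_composition(barcodes, weights)
        for fraction in column.values()
    )


# Hypothetical barcodes, invented for this example only.
four_libraries = ["ACGT", "CGTA", "GTAC", "TACG"]
three_libraries = ["ACGT", "CGTA", "GTAC"]

print(is_balanced(four_libraries))                        # True
print(is_balanced(three_libraries))                       # False
print(is_balanced(four_libraries, weights=[3, 1, 1, 1]))  # False
```

Under these assumptions, dropping one library or pooling one library at
three times the amount of the others is enough to skew the base
composition, which is the inflexibility described above.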

Because of the difficulties associated with incorporating
the barcode at the beginning of the sequencing
read, Illumina has implemented a second-read barcoding
strategy as a centerpiece of its TruSeq technology. The
second-read strategy circumvents the problems associated
with having the barcode embedded at the beginning of the
sequencing read; however, current adapter design makes it
difficult to implement this strategy using DNA from
ChIP-seq experiments. Specifically, adapter dimers are
formed during library preparation and must be removed
from the library before sequencing. Size selection with
AMPure XP beads is an efficient method to remove
adapter dimers [5], but since ChIP libraries often contain
DNA in the size range of 100 to 200 bp, the size difference
between the adapter dimers and adapter-ligated
immunoprecipitated DNA is difficult to resolve with this method.
Size selection by agarose gel electrophoresis can, in principle,
resolve the size difference between adapter dimers
and adapter-ligated immunoprecipitated DNA, but the
large Y-shaped DNA adapter molecules required for
second-read barcoding run aberrantly through agarose gels.



Here, we describe a method that bypasses the problems 
associated with size selection. Specifically, in contrast to the
two strategies discussed above, we perform five cycles of PCR
prior to size selection on an agarose gel to convert the Y- 
shaped DNA to double-stranded DNA, so that both the 
adapter-dimers and adapter-ligated immunoprecipitated 
DNA run true to their size during agarose gel electrophor- 
esis and accurate size selection can be performed. This step 
also results in a modest amplification of the immunopreci- 
pitated DNA prior to agarose gel size selection, which in- 
creases the yield of an otherwise inefficient step in the 
library preparation protocol. In addition, we also include a 
method to accurately quantitate the number of cycles re- 
quired for PCR amplification of the library, which reduces 
biases caused by over-amplification [6]. Library amplification
is performed using Kapa HiFi polymerase, which
is more efficient and produces less bias than Phusion
polymerase [7]. Finally, we have reduced the amount of
reagents required for library construction, making library
construction more affordable, which is especially important
as sequencing costs drop.

Figure 1 A schematic representation of the method for making barcoded ChIP-seq libraries. An asterisk denotes novel steps in the
protocol. (A) End-repair. (B) A-tailing. (C) Adapter ligation. (D) Conversion of Y-shaped DNA to double-stranded DNA. (E) Size selection. (F) Cycle
quantitation. (G) PCR amplification.

A schematic step-wise representation of the method is
illustrated in Figure 1. Briefly, a minimum of 2 ng of
immunoprecipitated DNA is treated with T4 DNA polymerase,
Klenow fragment and T4 polynucleotide kinase to blunt
the DNA ends and add 5' phosphates (Figure 1A). Next,
single 3' adenine overhangs are generated by incubating
the DNA with Klenow exo-minus and dATP (Figure 1B).
Barcoded Y-shaped oligonucleotide adapters are ligated to
the DNA (Figure 1C). The DNA molecules are then
converted to double-stranded DNA with five cycles of PCR
(Figure 1D), so that the DNA molecules run predictably
during the subsequent size selection by agarose gel
electrophoresis (Figure 1E). The precise number of PCR cycles
required for amplification is determined by qPCR (Figure 1F)
and finally the library is amplified by PCR accordingly
(Figure 1G).

Figure 2 Validation of the ChIP-seq library construction method by ChIP-seq against macroH2A1.2. (A) Bioanalyzer 2100 gel image
showing 1 µl of each library run on a DNA 1000 chip. (B) Validation of several macroH2A1.2 peaks by qPCR at the respective loci. Three loci not
bound by macroH2A1 are on the left side (red) and eight loci identified as macroH2A1 peaks are on the right side (green). Loci are named
according to their location on the mm9 mouse genome build. (C) A UCSC genome browser screenshot of representative macroH2A1.2 peaks.
Input DNA is colored in blue and the two anti-macroH2A1.2 ChIP libraries A and B in red.
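
The cycle-quantitation step (Figure 1F) can be sketched in Python as
follows. The text does not state the rule used to read the cycle number
off the qPCR curve, so the rule here (the first cycle at which
fluorescence reaches half of the plateau) is an assumption, as are the
function name and the fluorescence values, which are invented. Adding
the five pre-size-selection cycles assumes that the qPCR is run on
material that has already been through them.

```python
PRE_SELECTION_CYCLES = 5  # from the text: five cycles before size selection


def cycles_to_fraction_of_plateau(fluorescence, fraction=0.5):
    """Return the first cycle (1-based) at which the signal reaches the
    given fraction of the way from baseline to plateau.

    Assumed rule of thumb, not the rule stated in the paper.
    """
    if not fluorescence:
        raise ValueError("fluorescence readings are required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    baseline = min(fluorescence)
    plateau = max(fluorescence)
    if plateau == baseline:
        raise ValueError("no amplification detected")
    threshold = baseline + fraction * (plateau - baseline)
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle


# Invented per-cycle fluorescence readings from a small aliquot of the library.
curve = [0.10, 0.11, 0.14, 0.22, 0.42, 0.80, 1.30, 1.70, 1.92, 2.00]

additional_cycles = cycles_to_fraction_of_plateau(curve)
total_cycles = PRE_SELECTION_CYCLES + additional_cycles
print(additional_cycles, total_cycles)  # 7 12
```

Stopping well before the plateau is what keeps the library out of the
over-amplified range; a lower `fraction` gives a more conservative
cycle number.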

To demonstrate the robustness of our library preparation
protocol we performed ChIP against the histone
variant macroH2A1 in mouse embryonic fibroblasts
(MEFs). MacroH2A1 is a unique histone variant in that, in
addition to the canonical histone H2A domain, it also
contains a relatively large (30 kD) carboxyl-terminal
'macro' domain. While macroH2A1 is generally associated
with gene repression, it is also required for the
activation of a subset of genes [8]. The alternative use of a
64 bp or 73 bp mutually exclusive exon results in the
production of two distinct proteins, macroH2A1.1 and
macroH2A1.2, respectively [9]. While macroH2A1.1
efficiently binds the NAD+ metabolite ADP-ribose,
macroH2A1.2 does not [10]. We prepared Illumina
libraries from immunoprecipitated material using antibodies
targeted to macroH2A1.2 as well as from input DNA (for a
detailed protocol see Additional file 1). Five ChIP-seq libraries
were made: two from the macroH2A1.2 immunoprecipitated
DNA (macroH2A1.2 library A and macroH2A1.2
library B), two from the input DNA (input library A and
input library B) and one from input DNA in which the
pre-agarose gel PCR step was omitted. The libraries were
amplified with 12 cycles of PCR (including the pre-size selection
PCR cycles) and visualized on the Agilent Bioanalyzer
(Figure 2A) before being sequenced in one lane of the
HiSeq2000. In total, 76 million reads were obtained and
demultiplexed with Illumina's Casava data analysis
pipeline. Of these, 24 million, 15 million, 21 million and
15 million reads for macroH2A1.2 library A, macroH2A1.2
library B, input library A and input library B, respectively,
were mapped to the mouse genome build mm9 with the Bowtie
short read alignment software. MacroH2A1.2-enriched regions were
identified with the MACS software package [11] and the
data sets were uploaded onto the UCSC genome browser.
In total we identified 33 peaks in macroH2A1.2 library A.
Every peak identified in macroH2A1.2 library A was also a
peak in macroH2A1.2 library B, demonstrating the
reproducibility of the library construction protocol. An example
peak is shown in Figure 2C as a UCSC genome browser
screenshot. A few of the identified peaks were validated by
ChIP-qPCR (Figure 2B), further supporting the specificity
and the effectiveness of the method. Adapter sequences
represented less than 1% of sequences, which highlights
the effectiveness and necessity of converting the Y-shaped
adapters to double-stranded DNA prior to agarose gel size
selection.
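
The reproducibility check between the two replicate libraries can be
sketched in Python as an interval-overlap test. How peaks were matched
between libraries is not stated in the text, so treating any overlap of
at least one base as "the same peak" is an assumption; the function
names and coordinates are hypothetical and do not come from the data set.

```python
def overlaps(peak_a, peak_b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    chrom_a, start_a, end_a = peak_a
    chrom_b, start_b, end_b = peak_b
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def fraction_reproduced(peaks_a, peaks_b):
    """Fraction of peaks in `peaks_a` that overlap any peak in `peaks_b`."""
    if not peaks_a:
        raise ValueError("peaks_a must not be empty")
    shared = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return shared / len(peaks_a)


# Hypothetical peak coordinates, invented for illustration.
library_a = [("chr1", 100, 500), ("chr2", 1000, 1600)]
library_b = [("chr1", 450, 900), ("chr2", 900, 1100), ("chr3", 10, 50)]

print(fraction_reproduced(library_a, library_b))  # 1.0
```

The pairwise comparison is quadratic in the number of peaks, which is
adequate for a few dozen peaks; a sorted sweep or an interval tree
would be the usual choice for genome-scale peak sets.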

In conclusion, we present a robust and economical
method for generating barcoded ChIP-seq libraries, which
has substantial advantages over current methodologies.
We have successfully pooled between two and eight
samples in a single lane of the HiSeq2000 and consider that
any number of samples could be pooled without
difficulty. The libraries are compatible with Illumina's TruSeq
platform and can be demultiplexed by Illumina's Casava
data analysis pipeline. Importantly, the two innovations
detailed in this method, namely the conversion of the
Y-shaped adapter DNA to double-stranded DNA prior to
agarose gel size selection and the determination of the
precise number of cycles to perform during PCR
amplification, can also be applied to other next-generation
sequencing techniques such as MeDIP-seq.

Additional file 



Additional file 1: Detailed protocol. Chromatin immunoprecipitation 
and multiplexed Illumina library preparation protocol.